In this exercise, we will be using functions from the
tidyverse package. You can see we’ve added the chunk option
message = FALSE to hide the version information that
tidyverse normally displays.
library(tidyverse)
Load the afl_grand_finals.csv dataset we looked at previously.
Make a scatterplot of year vs winner_score.
Add a LOESS smoother or line of best fit.
afl_grand_finals <- read_csv("afl_grand_finals.csv")
ggplot(afl_grand_finals, aes(x = year, y = winner_score)) +
  geom_point() +
  geom_smooth(method = "loess")
Warning: Removed 3 rows containing non-finite values (`stat_smooth()`).
Warning: Removed 3 rows containing missing values (`geom_point()`).
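The warnings above most likely mean that a few rows have no winner_score recorded, so ggplot2 drops them before drawing. The sketch below shows how you might count such rows yourself, and how to draw a straight line of best fit instead of a LOESS curve. The data frame toy_scores and its values are made up purely for illustration; they are not the AFL data.

```r
library(ggplot2)

# Made-up illustrative values, NOT the real AFL data.
toy_scores <- data.frame(
  year         = c(2000, 2001, 2002, 2003, 2004, 2005),
  winner_score = c(90, NA, 85, 100, NA, 95)
)

# Count the rows that would be dropped because the y value is missing.
n_missing <- sum(is.na(toy_scores$winner_score))
n_missing

# A straight line of best fit: method = "lm" fits a linear model.
p_lm <- ggplot(toy_scores, aes(x = year, y = winner_score)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)
```

Printing p_lm draws the plot. With the real data you would swap toy_scores for afl_grand_finals.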
Here is a small dataset from the Victorian Electoral Commission, showing the number of lower house seats won by major parties in the 2018 state election:
election_data <- tribble(
  ~party,                   ~seats_won,
  "Australian Greens",      3,
  "Australian Labor Party", 55,
  "Liberal",                21,
  "The Nationals",          6,
  "Other Candidates",       3
)
First make a bar chart of it. You will need to specify which column to place on which axis, and use geom_col() instead of geom_bar(), since the data is already in summarised form (one observation per bar).
ggplot(election_data,
       aes(x = seats_won, y = party)) +
  geom_col()
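To see why geom_col() is the right choice here, it can help to compare it with geom_bar(). By default geom_bar() counts the rows in each category, which would give every party a bar of length one. The sketch below shows that geom_bar() only plots the pre-summarised values if you override its default with stat = "identity". The object names p_col and p_bar are ours, chosen for illustration.

```r
library(tidyverse)

election_data <- tribble(
  ~party,                   ~seats_won,
  "Australian Greens",      3,
  "Australian Labor Party", 55,
  "Liberal",                21,
  "The Nationals",          6,
  "Other Candidates",       3
)

# geom_col() plots the values in seats_won as they are.
p_col <- ggplot(election_data, aes(x = seats_won, y = party)) +
  geom_col()

# geom_bar() counts rows by default, so it needs stat = "identity"
# to plot values that are already summarised.
p_bar <- ggplot(election_data, aes(x = seats_won, y = party)) +
  geom_bar(stat = "identity")
```

Printing p_col and p_bar should give the same chart, so geom_col() is simply the shorter way to write it.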
Next make a line-and-dot chart, similar to the one demonstrated in the slides for Five principles of good graphics. (What do you think geom_segment() does? Look it up in R’s help system to confirm your guess.)
ggplot(election_data,
       aes(x = seats_won, y = party)) +
  geom_point() +
  geom_segment(aes(xend = 0, yend = party))
Take the scatterplot you made earlier and improve it to a standard you would be comfortable sharing with others.
- Add appropriate axis labels, a title, and a caption indicating what your line displays.
- Improve the x axis scale. ggplot usually provides sensible axis labels, but sometimes fails. In our case, the number “2020” is cut off slightly. You can control the x axis using scale_x_continuous(). Some parameters you can experiment with include limits = c(LOWNUMBER, HIGHNUMBER) (where LOWNUMBER and HIGHNUMBER are the lower and upper ends of the scale) and breaks = seq(LOWNUMBER, HIGHNUMBER, by = INCREMENT), which sets where the tick marks go on the axis. (breaks = is new, but we saw seq() briefly back in section 1.2!)
- Extend the y axis to include zero. (Is there any reason to do this here? Some people advocate all axes going to zero, but this is a complex issue that deserves consideration for each plot you make.)
- Apply your favourite theme.
- Change the colour of the points, smoother line, and 95% confidence interval for the smoother line by giving colour = and fill = options to geom_point() and geom_smooth(). You can use hexadecimal colour codes (like HTML), or one of the named colours here: http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf Colour names need to go inside quotes.
- Extension: what happens if you swap the order of geom_point() and geom_smooth(), if the points are not black? Look carefully at the shaded confidence region.
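As a quick reminder of how seq() builds the vector you pass to breaks =, here is a small sketch. The particular start, end and increment are just one reasonable choice for this plot, and x_breaks is a name we made up.

```r
# Tick positions every 20 years from 1900 to 2020.
x_breaks <- seq(1900, 2020, by = 20)
x_breaks
# 1900 1920 1940 1960 1980 2000 2020
```

You could then write scale_x_continuous(breaks = x_breaks), or put the seq() call directly inside scale_x_continuous().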
ggplot(afl_grand_finals, aes(x = year, y = winner_score)) +
  geom_smooth(method = "loess",
              colour = "deepskyblue4", fill = "deepskyblue1") +
  geom_point(colour = "firebrick4") +
  scale_x_continuous(limits = c(1898, 2020),
                     breaks = seq(1900, 2020, by = 20)) +
  scale_y_continuous(limits = c(0, 200)) +
  labs(x = "Year", y = "Score of winning team",
       title = "AFL Grand Final scores over time",
       caption = "Blue trend line shows LOESS smoother.") +
  theme_bw() +
  theme(plot.title.position = "plot",
        panel.grid.minor.x = element_blank(),
        panel.grid.minor.y = element_blank())
Warning: Removed 4 rows containing non-finite values (`stat_smooth()`).
Warning: Removed 4 rows containing missing values (`geom_point()`).
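Notice that the warnings now report 4 removed rows rather than 3. We have not checked which row is responsible, but one plausible explanation is that setting limits = on a scale discards any observation that falls outside those limits. If you only want to zoom in without discarding data, coord_cartesian() is an alternative. The sketch below illustrates the difference using a made-up data frame called toy (not the AFL data).

```r
library(ggplot2)

# Made-up values for illustration only; not the AFL data.
toy <- data.frame(x = c(1, 2, 3, 4, 5),
                  y = c(10, 20, 30, 40, 50))

# Scale limits turn out-of-range values into NA, so the point at
# x = 5 is dropped (with a warning) when the plot is drawn.
p_scale <- ggplot(toy, aes(x = x, y = y)) +
  geom_point() +
  scale_x_continuous(limits = c(0, 4))

# coord_cartesian() only changes the visible window; every point
# stays in the data, which matters for smoothers and other summaries.
p_zoom <- ggplot(toy, aes(x = x, y = y)) +
  geom_point() +
  coord_cartesian(xlim = c(0, 4))

# Number of x values that each approach has turned into NA.
sum(is.na(layer_data(p_scale)$x))
sum(is.na(layer_data(p_zoom)$x))
```

For a smoother, the choice matters: with scale limits the curve is fitted only to the points inside the limits, whereas with coord_cartesian() it is fitted to all the data and then cropped.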
Pick your favourite of the plots of the Victorian election data, and get it to a standard you would be happy sharing with others.
Some suggestions:
- Sort the parties in order of number of seats won. Hint: fct_reorder(party, seats_won, .desc = TRUE) should feature in your solution.
- Add appropriate axis labels.
- Add party-appropriate colours. Hint: these are hexadecimal codes for the colours used by Australia’s major political parties: c("#DE3533", "#0047AB", "#006644", "#10C25B", "#808080") in the order: ALP, Liberal, Nationals, Greens, Other (sourced from Wikipedia!).
- If you add colours, where should the legend go? Sometimes removing it altogether with theme(legend.position = "none") is the best option.
- Add a title and a caption indicating the source of the data.
- Pick your favourite theme. Think about what gridlines are needed.
election_data_sorted <- election_data %>%
  mutate(party = fct_reorder(party, seats_won, .desc = TRUE))
ggplot(election_data_sorted,
       aes(x = seats_won,
           y = party,
           colour = party)) +
  geom_segment(aes(xend = 0, yend = party)) +
  geom_point() +
  scale_x_continuous(expand = expansion(mult = c(0, 0.1))) +
  scale_y_discrete(limits = rev) +
  # Colours are matched to the factor levels in order:
  # ALP, Liberal, Nationals, Greens, Other.
  scale_colour_manual(values = c("#DE3533", "#0047AB",
                                 "#006644", "#10C25B",
                                 "#808080")) +
  labs(x = "Number of seats won",
       y = "Party",
       title = "Victorian state election 2018 lower house results",
       caption = "Data source: Victorian Electoral Commission") +
  theme_minimal() +
  # "none" is the documented value for hiding the legend.
  theme(legend.position = "none",
        plot.title.position = "plot",
        panel.grid.minor.x = element_blank(),
        panel.grid.major.y = element_blank())
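The solution above relies on the colour codes being listed in the same order as the sorted factor levels. Two parties are tied on 3 seats, so that order is a little fragile. One way to make the pairing explicit is to give scale_colour_manual() a named vector, so each colour is matched to its party by name. This is only a sketch of that alternative: party_colours and p_named are names we invented, and the rest of the styling is left out to keep it short.

```r
library(tidyverse)

election_data <- tribble(
  ~party,                   ~seats_won,
  "Australian Greens",      3,
  "Australian Labor Party", 55,
  "Liberal",                21,
  "The Nationals",          6,
  "Other Candidates",       3
)

# Each colour is attached to a party name, so the order no longer matters.
party_colours <- c(
  "Australian Labor Party" = "#DE3533",
  "Liberal"                = "#0047AB",
  "The Nationals"          = "#006644",
  "Australian Greens"      = "#10C25B",
  "Other Candidates"       = "#808080"
)

p_named <- election_data %>%
  mutate(party = fct_reorder(party, seats_won, .desc = TRUE)) %>%
  ggplot(aes(x = seats_won, y = party, colour = party)) +
  geom_segment(aes(xend = 0, yend = party)) +
  geom_point() +
  scale_y_discrete(limits = rev) +
  scale_colour_manual(values = party_colours) +
  theme_minimal() +
  theme(legend.position = "none")
```

The names in party_colours must match the values in the party column exactly, otherwise the unmatched party is drawn in the scale's missing-value colour.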
© 2021 Statistical Consulting Centre, The University of Melbourne.